Customizing Metric Outputs using Output Templates

This interactive notebook guides model developers through the process of customizing the standard outputs produced by running a suite of ValidMind tests with the ValidMind Developer Framework. It uses the Bank Customer Churn Prediction sample dataset from Kaggle to train a simple classification model.

As part of the notebook, you will build on the simple quickstart_customer_churn notebook and learn how to:

  • Create an output template to customize the look and feel of the results produced by the ValidMind tests
  • Use output templates in your code to create one-off customized results
  • Add output templates to your documentation templates to save and share your customizations

Background

The ValidMind Developer Framework provides a suite of tests and metrics to help you evaluate the performance of your machine learning models. The out-of-the-box results are designed to be informative and easy to understand, but you may want to customize their look and feel to better suit your needs. This might include adding or removing columns, changing the formatting or structure of a table, or adding entirely new tables to the results. Output templates allow you to do all of these things and more. They are written in HTML and use the Jinja2 templating language. Please note that output templates are a new addition to the Developer Framework and are currently limited to creating and customizing tables, but will be expanded to include other types of outputs in the future.

Key Concepts

  • Output Templates: Customizable HTML templates that define the look and feel of the results produced by the ValidMind tests. They are written in HTML and use the Jinja2 templating language.
  • Jinja2 Templating Language: A powerful templating language for Python that allows you to embed expressions and control structures in your HTML templates.
  • Customizing Tables: Output templates allow you to customize the look and feel of the tables produced by the ValidMind tests. This includes things like adding or removing columns, changing the formatting or structure of the table, and adding entirely new tables to the results.
  • Documentation Templates: Documentation templates are covered in the quickstart notebook and are the base for all model documentation. They are written in YAML and define the entire structure and content of a model’s documentation. Output templates are not part of the documentation template, but they are defined in and shared via a field in the documentation template.

How Documentation Templates Work with Output Templates

Below is a section of the standard Binary Classification Documentation Template that comes pre-installed with the ValidMind Developer Framework. The top-level model_development section contains a list of subsections, such as model_evaluation, which in turn contain content blocks. These blocks can be either editable text blocks or test-driven blocks, where the content_id identifies the threshold test or metric within the Developer Framework whose results are displayed in that block. The key point is that each of these tests produces a specific output that is not directly editable from the ValidMind platform. This is where output templates come in, both figuratively and literally. They can be added as an optional field in the content block, where they use the raw test output data in an HTML template to produce a custom table that can be displayed in the documentation.

Example Documentation Template

- id: model_development
  title: Model Development
  index_only: true
  sections:
    - id: model_training
      title: Model Training
      guidelines:
        - Describe the model training process, including the algorithm used, any
          hyperparameters or settings, and the optimization techniques employed
          to minimize the loss function or maximize the objective function.
        - ... (additional guidelines)
      contents:
        - content_type: metric
          content_id: validmind.model_validation.ModelMetadata
        - ... (additional content blocks)
      parent_section: model_development
    - id: model_evaluation
      title: Model Evaluation
      guidelines:
        - Describe the process used to evaluate the model's performance on a
          test or validation dataset that was not used during training, to
          assess its generalizability and robustness.
        - ... (additional guidelines)
      contents:
        - content_type: metric
          content_id: validmind.model_validation.sklearn.ConfusionMatrix
        - content_type: metric
          content_id: validmind.model_validation.sklearn.ClassifierPerformance
        - ... (additional content blocks)
      parent_section: model_development

In the above example, the validmind.model_validation.sklearn.ClassifierPerformance metric produces two tables like this:

[Image: standard ClassifierPerformance output]

With output templates, you can customize the look and feel of the output to produce a much simpler and clearer version like this:

[Image: customized ClassifierPerformance output]

This is accomplished with the following output template:

- content_type: metric
  content_id: validmind.model_validation.sklearn.ClassifierPerformance:with_template
  output_template: |
    <table>
        <thead>
            <tr>
                <th>Accuracy</th>
                <th>Precision</th>
                <th>Recall</th>
                <th>F1 Score</th>
            </tr>
        </thead>
        <tbody>
            <tr>
                <td>{{ value["accuracy"] }}</td>
                <td>{{ value["weighted avg"]["precision"] }}</td>
                <td>{{ value["weighted avg"]["recall"] }}</td>
                <td>{{ value["weighted avg"]["f1-score"] }}</td>
            </tr>
        </tbody>
    </table>

As you can see, the output template is a simple HTML table that uses the Jinja2 templating language to embed expressions that reference the raw test output data. The {{ value["accuracy"] }} expression, for example, references the accuracy key in the raw test output data. This is how you can customize the look and feel of the results produced by the ValidMind tests.
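To experiment with these expressions outside the ValidMind platform, a template can be rendered with the jinja2 package directly. The following is only a minimal sketch: the value dictionary is a hand-written stand-in that mirrors part of the raw output shown later in this notebook, and it assumes (without confirmation from the library) that ValidMind passes the raw output to the template under the name value in a broadly similar way.

```python
from jinja2 import Template

# Hand-written stand-in for the raw test output (assumption: same keys as the real result)
value = {
    "accuracy": 0.8902083333333334,
    "weighted avg": {
        "precision": 0.8871882789889916,
        "recall": 0.8902083333333334,
        "f1-score": 0.8809166716961492,
    },
}

# A single table row using the same lookups as the output template above
row_template = Template(
    "<tr>"
    '<td>{{ value["accuracy"] }}</td>'
    '<td>{{ value["weighted avg"]["precision"] }}</td>'
    "</tr>"
)

# Each {{ ... }} expression is replaced by the value it looks up
html = row_template.render(value=value)
print(html)
```

Each expression behaves like an ordinary Python dictionary lookup, so a misspelled key shows up immediately when you render the template locally.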

Now that you understand the basics of output templates, the following sections will guide you through the process of creating and using them in your code.

Install the client library

The client library provides Python support for the ValidMind Developer Framework. To install it:

%pip install -q validmind
Note: you may need to restart the kernel to use updated packages.

Initialize the client library

ValidMind generates a unique code snippet for each registered model to connect with your developer environment. You initialize the client library with this code snippet, which ensures that your documentation and tests are uploaded to the correct model when you run the notebook.

Get your code snippet:

  1. In a browser, log into the Platform UI.

  2. In the left sidebar, navigate to Model Inventory and click + Register new model.

  3. Enter the model details and click Continue. (Need more help?)

    For example, to register a model for use with this notebook, select:

    • Documentation template: Binary classification
    • Use case: Marketing/Sales - Attrition/Churn Management

    You can fill in other options according to your preference.

  4. Go to Getting Started and click Copy snippet to clipboard.

Next, replace this placeholder with your own code snippet:

# Replace with your code snippet

import validmind as vm

vm.init(
    api_host="https://api.prod.validmind.ai/api/v1/tracking",
    api_key="...",
    api_secret="...",
    project="...",
)
2024-04-10 17:23:41,968 - INFO(validmind.api_client): Connected to ValidMind. Project: [Int. Tests] Customer Churn - Initial Validation (cltnl29bz00051omgwepjgu1r)

Initialize the Python environment

Next, let’s import the necessary libraries and set up your Python environment for data analysis:

import xgboost as xgb

%matplotlib inline

Load the sample dataset

The sample dataset used here is provided by the ValidMind library. To be able to use it, you need to import the dataset and load it into a pandas DataFrame, a two-dimensional tabular data structure that makes use of rows and columns:

# Import the sample dataset from the library

from validmind.datasets.classification import customer_churn as demo_dataset

print(
    f"Loaded demo dataset with: \n\n\t• Target column: '{demo_dataset.target_column}' \n\t• Class labels: {demo_dataset.class_labels}"
)

raw_df = demo_dataset.load_data()
raw_df.head()
Loaded demo dataset with: 

    • Target column: 'Exited' 
    • Class labels: {'0': 'Did not exit', '1': 'Exited'}
CreditScore Geography Gender Age Tenure Balance NumOfProducts HasCrCard IsActiveMember EstimatedSalary Exited
0 619 France Female 42 2 0.00 1 1 1 101348.88 1
1 608 Spain Female 41 1 83807.86 1 0 1 112542.58 0
2 502 France Female 42 8 159660.80 3 1 0 113931.57 1
3 699 France Female 39 1 0.00 2 0 0 93826.63 0
4 850 Spain Female 43 2 125510.82 1 1 1 79084.10 0

Document the model

As part of documenting the model with the ValidMind Developer Framework, you need to preprocess the raw dataset, initialize some training and test datasets, initialize a model object you can use for testing, and then run the full suite of tests.

Preprocess the raw dataset

The following cell performs a number of operations to prepare the data and train the model for the subsequent steps:

  • Preprocess the data: Splits the DataFrame (raw_df) into multiple datasets (train_df, validation_df, and test_df) using demo_dataset.preprocess to simplify preprocessing.
  • Separate features and targets: Drops the target column to create feature sets (x_train, x_val) and target sets (y_train, y_val).
  • Initialize XGBoost classifier: Creates an XGBClassifier object with early stopping rounds set to 10.
  • Set evaluation metrics: Specifies metrics for model evaluation as “error,” “logloss,” and “auc.”
  • Fit the model: Trains the model on x_train and y_train using the validation set (x_val, y_val). Verbose output is disabled.

train_df, validation_df, test_df = demo_dataset.preprocess(raw_df)

x_train = train_df.drop(demo_dataset.target_column, axis=1)
y_train = train_df[demo_dataset.target_column]
x_val = validation_df.drop(demo_dataset.target_column, axis=1)
y_val = validation_df[demo_dataset.target_column]

model = xgb.XGBClassifier(early_stopping_rounds=10)
model.set_params(
    eval_metric=["error", "logloss", "auc"],
)
model.fit(
    x_train,
    y_train,
    eval_set=[(x_val, y_val)],
    verbose=False,
)
XGBClassifier(base_score=None, booster=None, callbacks=None,
              colsample_bylevel=None, colsample_bynode=None,
              colsample_bytree=None, early_stopping_rounds=10,
              enable_categorical=False, eval_metric=['error', 'logloss', 'auc'],
              feature_types=None, gamma=None, gpu_id=None, grow_policy=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_bin=None, max_cat_threshold=None,
              max_cat_to_onehot=None, max_delta_step=None, max_depth=None,
              max_leaves=None, min_child_weight=None, missing=nan,
              monotone_constraints=None, n_estimators=100, n_jobs=None,
              num_parallel_tree=None, predictor=None, random_state=None, ...)

Initialize the ValidMind datasets

Before you can run tests, you must first initialize a ValidMind dataset object using the init_dataset function from the ValidMind (vm) module.

This function takes a number of arguments:

  • dataset: the raw dataset that you want to provide as input to tests
  • input_id: a unique identifier that allows tracking what inputs are used when running each individual test
  • target_column: a required argument if tests require access to true values. This is the name of the target column in the dataset
  • class_labels: an optional value to map predicted classes to class labels

With all datasets ready, you can now initialize the raw, training and test datasets (raw_df, train_df and test_df) created earlier into their own dataset objects using vm.init_dataset():

import validmind as vm

vm_raw_dataset = vm.init_dataset(
    dataset=raw_df,
    input_id="raw_dataset",
    target_column=demo_dataset.target_column,
    class_labels=demo_dataset.class_labels,
)

vm_train_ds = vm.init_dataset(
    dataset=train_df, input_id="train_dataset", target_column=demo_dataset.target_column
)

vm_test_ds = vm.init_dataset(
    dataset=test_df, input_id="test_dataset", target_column=demo_dataset.target_column
)
2024-04-10 17:23:42,147 - INFO(validmind.client): Pandas dataset detected. Initializing VM Dataset instance...
2024-04-10 17:23:42,360 - INFO(validmind.client): Pandas dataset detected. Initializing VM Dataset instance...
2024-04-10 17:23:42,407 - INFO(validmind.client): Pandas dataset detected. Initializing VM Dataset instance...

Initialize a model object

Additionally, you need to initialize a ValidMind model object (vm_model) that can be passed to other functions for analysis and tests on the data. You initialize this model object with vm.init_model():

vm_model = vm.init_model(
    model,
    input_id="model",
)

Assign predictions to the datasets

We can now use the assign_predictions() method from the Dataset object to link existing predictions to any model. If no prediction values are passed, the method will compute predictions automatically:

vm_train_ds.assign_predictions(model=vm_model)
vm_test_ds.assign_predictions(model=vm_model)
2024-04-10 17:23:43,049 - INFO(validmind.vm_models.dataset): Running predict()... This may take a while
2024-04-10 17:23:43,050 - INFO(validmind.vm_models.dataset): Running predict()... This may take a while

Run individual tests and customize the results

Instead of running the full suite of tests, you can run individual tests and metrics. This is useful for experimentation and for exploring and building output templates to create custom results. Let’s go ahead and run a single test, the ClassifierPerformance metric, and see how we can create fully customized results from its output using output templates.

from validmind.tests import run_test

First, let’s run the test as normal and see the standard output:

result = run_test(
    test_id="validmind.model_validation.sklearn.ClassifierPerformance",
    inputs={
        "dataset": vm_train_ds,
        "model": vm_model,
    },
)

Let’s also take a look at the result object that is returned when running the test and see how we can grab the raw metric value from it to start developing our output template:

import json

print("In Sample Performance Raw Value:")
print(json.dumps(result.metric.value, indent=2))
In Sample Performance Raw Value:
{
  "0.0": {
    "precision": 0.8958233317330773,
    "recall": 0.9756862745098039,
    "f1-score": 0.9340508071580528,
    "support": 3825.0
  },
  "1.0": {
    "precision": 0.8533123028391167,
    "recall": 0.5548717948717948,
    "f1-score": 0.6724673710379117,
    "support": 975.0
  },
  "accuracy": 0.8902083333333334,
  "macro avg": {
    "precision": 0.874567817286097,
    "recall": 0.7652790346907994,
    "f1-score": 0.8032590890979823,
    "support": 4800.0
  },
  "weighted avg": {
    "precision": 0.8871882789889916,
    "recall": 0.8902083333333334,
    "f1-score": 0.8809166716961492,
    "support": 4800.0
  },
  "roc_auc": 0.7652790346907995
}
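The macro avg and weighted avg entries in this output can be reproduced from the per-class rows. The sketch below assumes the raw value follows the layout of scikit-learn's classification report, which the keys suggest but the notebook does not state. It recomputes both precision averages from the numbers printed above.

```python
import math

# Per-class precision and support copied from the raw value printed above
per_class = {
    "0.0": {"precision": 0.8958233317330773, "support": 3825.0},
    "1.0": {"precision": 0.8533123028391167, "support": 975.0},
}

total_support = sum(c["support"] for c in per_class.values())

# Macro average: every class counts equally
macro_precision = sum(c["precision"] for c in per_class.values()) / len(per_class)

# Weighted average: each class counts in proportion to its support
weighted_precision = (
    sum(c["precision"] * c["support"] for c in per_class.values()) / total_support
)

print(f"macro avg precision:    {macro_precision:.6f}")
print(f"weighted avg precision: {weighted_precision:.6f}")

# Compare against the values reported in the raw output
print(math.isclose(macro_precision, 0.874567817286097, rel_tol=1e-9))
print(math.isclose(weighted_precision, 0.8871882789889916, rel_tol=1e-9))
```

Because the majority class has roughly four times the support of the minority class, the weighted average sits closer to the majority-class score. Keep this in mind when choosing which average to surface in a customized table.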

The raw value object printed above is what gets passed into the output template, where it is accessible through the value variable. Now let’s go ahead and create a simple output template like the one in the earlier example and then see how we can test it directly against the result object:

output_template = """
<table>
    <thead>
        <tr>
            <th>Accuracy</th>
            <th>Precision</th>
            <th>Recall</th>
            <th>F1 Score</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>{{ value["accuracy"] }}</td>
            <td>{{ value["weighted avg"]["precision"] }}</td>
            <td>{{ value["weighted avg"]["recall"] | number }}</td>
            <td>{{ value["weighted avg"]["f1-score"] | number }}</td>
        </tr>
    </tbody>
</table>
"""
# specifically notice how the values from the result are being accessed inside the template
# also notice that we can use filters to format the values e.g. `| number` to format the number to 4 decimal places
# we can immediately re-render while trying different output templates
result.render(output_template=output_template)

And there you go: you have successfully created and used a custom output template. Try making some changes to the template HTML and see how they affect the output. You can also add more complex logic and control structures to the template using the Jinja2 templating language.
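As an illustration of such control structures, the sketch below renders one table row per class with a Jinja2 for loop and an if test, using the jinja2 package directly rather than result.render(). Note that number is not one of Jinja2's built-in filters, so it is presumably registered by ValidMind. The lambda registered here is a hypothetical stand-in based on the "4 decimal places" comment above, not ValidMind's actual implementation, and the value dictionary is a trimmed copy of the raw output.

```python
from jinja2 import Environment

env = Environment()
# Hypothetical stand-in for the `number` filter (assumption: 4 decimal places)
env.filters["number"] = lambda x: f"{x:.4f}"

template = env.from_string(
    """
<table>
  <tbody>
  {% for label, scores in value.items() %}
    {% if scores is mapping %}
    <tr>
      <td>{{ label }}</td>
      <td>{{ scores["precision"] | number }}</td>
      <td>{{ scores["recall"] | number }}</td>
    </tr>
    {% endif %}
  {% endfor %}
  </tbody>
</table>
"""
)

# Trimmed copy of the raw value: two per-class entries plus one scalar entry
value = {
    "0.0": {"precision": 0.8958233317330773, "recall": 0.9756862745098039},
    "1.0": {"precision": 0.8533123028391167, "recall": 0.5548717948717948},
    "accuracy": 0.8902083333333334,
}

# The `is mapping` test skips scalar entries such as "accuracy"
html = template.render(value=value)
print(html)
```

Once a template like this renders as expected locally, the same template string can be tried against the real result with result.render(output_template=...), as shown above.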

Now that you have a working output template, it can also be passed right into the run_test function to produce the same results as before:

result = run_test(
    test_id="validmind.model_validation.sklearn.ClassifierPerformance",
    inputs={
        "dataset": vm_train_ds,
        "model": vm_model,
    },
    output_template=output_template,
)

Awesome! So you have seen how to create and use output templates when running individual tests. In a real-world scenario though, you would want to add the output template to the documentation template so that it can live there as a permanent customization. This is what we will cover next.

Run the full suite of tests with output templates

Now that you’ve seen how to run an individual test and customize the output, let’s see how you can apply that concept to model documentation by adding the output template to the documentation template and running the full suite of tests.

Add the output template to the documentation template

First, go to your project in the ValidMind UI and go to Settings > Templates. Find the Binary Classification Template that is used by the Customer Churn project. Click on the Edit button to bring up the template editor. Then, add the following content block below the existing validmind.model_validation.sklearn.ClassifierPerformance metric:

- content_type: metric
  content_id: validmind.model_validation.sklearn.ClassifierPerformance:with_template
  output_template: |
    <table>
        <thead>
            <tr>
                <th>Accuracy</th>
                <th>Precision</th>
                <th>Recall</th>
                <th>F1 Score</th>
            </tr>
        </thead>
        <tbody>
            <tr>
                <td>{{ value["accuracy"] }}</td>
                <td>{{ value["weighted avg"]["precision"] }}</td>
                <td>{{ value["weighted avg"]["recall"] }}</td>
                <td>{{ value["weighted avg"]["f1-score"] }}</td>
            </tr>
        </tbody>
    </table>

This will add a second version of the ClassifierPerformance metric so we can compare the standard output with the custom output.
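The same edit can be pictured as a small data transformation. The sketch below is purely illustrative: the real change is made in the template editor in the ValidMind UI, add_templated_copy is a hypothetical helper rather than part of the ValidMind library, and the contents list is a simplified stand-in for the YAML shown earlier.

```python
import copy

CLASSIFIER_PERFORMANCE = "validmind.model_validation.sklearn.ClassifierPerformance"

# Simplified stand-in for the `contents` list of the model_evaluation section
contents = [
    {
        "content_type": "metric",
        "content_id": "validmind.model_validation.sklearn.ConfusionMatrix",
    },
    {"content_type": "metric", "content_id": CLASSIFIER_PERFORMANCE},
]


def add_templated_copy(blocks, content_id, output_template, suffix="with_template"):
    """Return a new list with a templated block placed right after the matching block.

    Hypothetical helper for illustration only; the suffix mirrors the
    `:with_template` suffix used in the content block above.
    """
    result = []
    for block in blocks:
        result.append(copy.deepcopy(block))
        if block["content_id"] == content_id:
            result.append(
                {
                    "content_type": block["content_type"],
                    "content_id": f"{content_id}:{suffix}",
                    "output_template": output_template,
                }
            )
    return result


updated = add_templated_copy(contents, CLASSIFIER_PERFORMANCE, "<table>...</table>")
for block in updated:
    print(block["content_id"], "| has template:", "output_template" in block)
```

The original block is left untouched, so the documentation ends up with both the standard and the customized version of the metric side by side.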

Run the full suite of tests

Now that you’ve added the output template to the documentation template, you can run the following code cells to initialize the client, which retrieves the template. Then you can run the full suite of tests to see the custom output in the documentation and in the ValidMind UI.

full_suite = vm.run_documentation_tests(
    section=["model_development"],
    inputs={
        "dataset": vm_test_ds,
        "datasets": (vm_train_ds, vm_test_ds),
        "model": vm_model,
    },
)

Next steps

Now that you’ve seen how to create and use output templates, you can take a closer look at the results produced by the full suite of tests in the ValidMind UI.

  1. In the Platform UI, go to the Documentation page for the model you registered earlier.

  2. Take a look at the Model Development section to see the results of the tests you just performed.

You should see two versions of the ClassifierPerformance metric: one with the standard output and one with the custom output produced by the output template. Compare the two to see how the output template has customized the look and feel of the results.


If you want to learn more about where you are in the model documentation process, take a look at "How do I use the framework?"